In most cases, Escherichia coli exists as a harmless commensal organism, but it may on occasion cause
intestinal and/or extraintestinal disease. Enterotoxigenic E. coli (ETEC) is the predominant cause of E.
coli-mediated diarrhea in the developing world and is responsible for a significant portion of pediatric deaths.

In  this  study,  we  determined  the  complete  genomic  sequence  of  E.  coli  H10407,  a  prototypical  strain  of 
enterotoxigenic  E.  coli,  which  reproducibly  elicits  diarrhea  in  human  volunteer  studies.  We  performed  genomic 
and  phylogenetic  comparisons  with  other  E.  coli  strains,  revealing  that  the  chromosome  is  closely  related  to 
that  of  the  nonpathogenic  commensal  strain  E.  coli  HS  and  to  those  of  the  laboratory  strains  E.  coli  K-12  and 
C.  Furthermore,  these  analyses  demonstrated  that  there  were  no  chromosomally  encoded  factors  unique  to  any 
sequenced ETEC strains. Comparison of the E. coli H10407 plasmids with those from several ETEC strains
revealed  that  the  plasmids  had  a  mosaic  structure  but  that  several  loci  were  conserved  among  ETEC  strains. 

This  study  provides  a  genetic  context  for  the  vast  amount  of  experimental  and  epidemiological  data  that  have 
been  published. 


Current dogma suggests the Gram-negative motile bacterium
Escherichia coli colonizes the infant gut within hours of
birth and establishes itself as the predominant facultative
anaerobe of the colon for the remainder of life (3, 59). While
the majority of E. coli strains maintain this harmless existence,
some strains have adopted a pathogenic lifestyle. Contemporary
tenets suggest that pathogenic strains of E. coli have acquired
genetic elements that encode virulence factors and enable
the organism to cause disease (12). The large repertoire of
virulence  factors  enables  E.  coli  to  cause  a  variety  of  clinical 
manifestations,  including  intestinal  infections  mediating  diarrhea 
and  extraintestinal  infections,  such  as  urinary  tract  infections, 
septicemia, and meningitis. Based on clinical manifestation of
disease, the repertoire of virulence factors, epidemiology, and
phylogenetic profiles, the strains causing intestinal infections can
be divided into six separate pathotypes, viz., enteroaggregative E.
coli (EAEC), enteroinvasive E. coli (EIEC), enteropathogenic E.
coli (EPEC), enterohemorrhagic E. coli (EHEC), diffuse adhering
E. coli (DAEC), and enterotoxigenic E. coli (ETEC) (33,
35, 39).

ETEC is responsible for the majority of E. coli-mediated
cases of human diarrhea worldwide. It is particularly prevalent
among children in developing countries, where sanitation and
clean supplies of drinking water are inadequate, and in travelers
to such regions. It is estimated that there are 200 million
cases of ETEC infection annually, resulting in hundreds
of thousands of deaths in children under the age of 5 (55, 64).
The essential determinants of ETEC virulence are traditionally
considered to be colonization of the host small-intestinal epithelium
via plasmid-encoded colonization factors (CFs) and
subsequent release of plasmid-encoded heat-stable (ST)
and/or heat-labile (LT) enterotoxins that induce a net secretory
state leading to profuse watery diarrhea (20, 62). More
recently, additional plasmid-encoded factors have been implicated
in the pathogenesis of ETEC, namely, the EatA serine
protease autotransporter (SPATE) and the EtpA protein,
which acts as an intermediate in the adhesion between bacterial
flagella and host cells (23, 32, 42, 46). Furthermore, a
number of chromosomal factors are thought to be involved in
virulence, e.g., the invasin Tia; the TibA adhesin/invasin; and
LeoA, a GTPase with unknown function (14, 21, 22). E. coli
H10407 is considered a prototypical ETEC strain; it expresses
colonization factor antigen I (CFA/I) and the heat-stable and
heat-labile toxins. Loss of a 94.8-kb plasmid encoding CFA/I
and a gene for ST enterotoxin from E. coli strain H10407 leads
to reduced ability to cause diarrhea (17).

TABLE 1. General characteristics of three sequenced
E. coli chromosomes

                       Value for strain:
Characteristic         H10407       K-12                HS
Etiology               Pathogen     Laboratory strain   Commensal
Length (bp)            5,153,435    4,643,538           4,686,137
GC content (%)         50.8         50.8                50.8
Total no. of CDSs      4,746        4,384               4,200
tRNA genes             87           86                  86
rRNA genes             7            7                   7

Here, we report the complete genome sequence and virulence
factor repertoire of the prototypical ETEC strain
H10407 and the nucleotide sequence and gene repertoire of
the plasmids from ETEC strain E1392/75, and we describe a
novel conserved secretion system associated with the sequenced
ETEC strains.

MATERIALS  AND  METHODS 

Bacterial strains and sequencing. The ETEC O78:H11:K80 strain H10407 was
isolated from an adult with cholera-like symptoms in the course of an epidemiologic
study in Dacca, Bangladesh, prior to 1973 (19) and was shown to cause
diarrhea in adult volunteers (6, 17). The E. coli H10407 isolate that was sequenced
was from the Walter Reed Army Institute of Research (WRAIR)
cGMP stock manufactured in February 1998 as lot 0519. The whole genome was
sequenced to a depth of 8x coverage from pUC19 (insert size, 2.8 to 5 kb) and
pMAQ1b (insert size, 5.5 to 10 kb) small-insert libraries. Sanger sequencing was
carried out using Amersham Big Dye (Amersham, United Kingdom) terminator
chemistry on ABI3700 sequencing machines. End sequences from larger-insert
plasmid (pBACe3.6; 20- to 30-kb insert size) libraries were used as a scaffold.
Sequence reads were assembled into contigs with Phrap (P. Green, unpublished
data) and finished using GAP4, as described previously (33). The plasmids from
the ETEC O6:H16:K15 strain E1392/75, which was isolated from a patient in
Hong Kong with diarrhea and which expresses the CFA/II (CS1 and CS3)
colonization factors and produces the ST and LT toxins, were also sequenced
using a similar approach (7, 50, 60). Plasmid DNA for ETEC E1392/75 was
provided by Acambis United Kingdom.

Gene prediction, annotation, and comparative analysis. Annotation was carried
out using the genome viewer Artemis (47). Coding sequences were predicted
using the gene prediction programs Orpheus (26), Glimmer2 (11), and Glimmer3
(10) and then manually curated. Protein domains were marked up using
Pfam (48), and transmembrane domains and signal sequences were predicted
using TMHMM and SignalP, respectively (15, 37). Annotation was transferred
from previously annotated E. coli genomes to orthologous genes and manually
curated. A homologue was considered to be present if a hit was found with
>60% identity over at least 80% of the length of the query protein. Regions of
difference (ROD) and plasmids were annotated and curated manually.
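
The homologue criterion stated above can be expressed as a simple filter. The following is a minimal sketch only: the `Hit` record and its field names are hypothetical and do not correspond to the output of any particular search program, and coverage is assumed to mean aligned query residues divided by full query length.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise protein alignment (illustrative field names)."""
    percent_identity: float    # identity over the aligned region, 0 to 100
    aligned_query_length: int  # number of query residues covered by the alignment
    query_length: int          # full length of the query protein


def is_homologue(hit: Hit, min_identity: float = 60.0,
                 min_coverage: float = 0.80) -> bool:
    """Apply the stated rule: >60% identity over at least 80% of the query."""
    coverage = hit.aligned_query_length / hit.query_length
    return hit.percent_identity > min_identity and coverage >= min_coverage
```

Under this reading, a hit with 75% identity covering 270 of 300 query residues would be counted as a homologue, whereas a hit with 90% identity covering only 200 of 300 residues would not.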

Nucleotide sequence accession numbers. The annotated genome sequence of
ETEC H10407 and the plasmids from ETEC H10407 and E1392/75 have been
deposited in the EMBL databases (accession number FN649414 for the complete
ETEC H10407 chromosome; Tables 1 and 2 list the general features of the
nucleotide sequences and accession numbers for the plasmids).


RESULTS  AND  DISCUSSION 

Structure and general features of the ETEC H10407 chromosome.
The ETEC H10407 genome consists of a circular
chromosome of 5,153,435 bp and four plasmids designated
pETEC948, pETEC666, pETEC58, and pETEC52. The general
features of the ETEC H10407 chromosome are presented
in Table 1 and those of the plasmids in Table 2. We identified 4,746
protein-coding genes (CDSs) in the chromosome, 33 (0.67%)
of which did not have any match in the database, while 579
(11.67%) encoded conserved hypothetical proteins with no
known function and 503 (10.14%) were genes associated with
mobile elements, such as integrases or transposases, or were
phage related. We have identified 25 ROD that occur in the
ETEC H10407 genome and are differentially distributed
among the other sequenced E. coli chromosomes (Fig. 1; see
Table S1 in the supplemental material). The combined size of
these ROD is 755,359 bp (14.7% of the chromosome) and
includes nine prophages, designated ETP29, -33, -86, -128,
-216, -284, -295, -468, and -507, where the numeric designations
denote their approximate positions (times 10,000 bp) on
the chromosome. None appeared to carry cargo genes related
to virulence.
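
The prophage naming convention and the ROD fraction quoted above are simple arithmetic, sketched below. The function names are illustrative and not part of the original analysis; the constants are taken from the text and Table 1.

```python
CHROMOSOME_LENGTH_BP = 5_153_435  # H10407 chromosome length (Table 1)
ROD_TOTAL_BP = 755_359            # combined size of the 25 ROD

PROPHAGES = ["ETP29", "ETP33", "ETP86", "ETP128", "ETP216",
             "ETP284", "ETP295", "ETP468", "ETP507"]


def approximate_position_bp(prophage_name: str) -> int:
    """Convert a name such as 'ETP128' to its approximate position in bp."""
    number = int(prophage_name[len("ETP"):])
    return number * 10_000


def rod_fraction_percent() -> float:
    """Percentage of the chromosome occupied by the ROD, to one decimal."""
    return round(100 * ROD_TOTAL_BP / CHROMOSOME_LENGTH_BP, 1)
```

For example, `approximate_position_bp("ETP128")` gives 1,280,000 bp, and `rod_fraction_percent()` reproduces the 14.7% stated in the text.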

Comparative genomics of the ETEC H10407 chromosome.
Previously, a phylogeny was constructed based on the concatenated
sequences of 2,173 genes that are conserved in all E.
coli strains and in Escherichia albertii and Escherichia fergusonii,
which were included as outgroup sequences (4). The established
E. coli subgroups (A, B1, B2, D, and E) are all
monophyletic, with the exception of group D, which is divided
at the root. In agreement with previous optical-mapping experiments
(5), E. coli H10407 is located in the A subgroup with
the nonpathogenic laboratory strains E. coli K-12 and C and
the nonpathogenic commensal isolate E. coli HS. The majority
of commensal strains of bacteria belong to the A subgroup
(59).

TABLE 2. General characteristics of the plasmids from ETEC strains H10407 and E1392/75

Strain     Plasmid     Accession no.   Size (bp)   No. of CDSs   Rep
H10407     pETEC948    FN649418        94,797      115           RepFIIA
H10407     pETEC666    FN649417        66,681      88            RepFIIA
H10407     pETEC58     FN649416        5,800       7             ColE2
H10407     pETEC52     FN649415        5,175       6             ColE1
E1392/75   pETEC1018   FN822745        101,857     165           RepFIIA
E1392/75   pETEC746    FN822748        74,575      117           RepI1
E1392/75   pETEC557    FN822746        55,709      73            RepFIB/RepI1
E1392/75   pETEC75     FN822749        7,497       9             ColE1
E1392/75   pETEC62     FN822747        6,222       13            ND(a)

Plasmid     Stability genes                    Insertion elements
pETEC948    StbAB, PsiAB, SopAB, YacAB, RelE   IS1, IS2, IS3, IS66, IS91, IS100, IS629, IS911, IS1414, ISEc10, ISEc12, IS S/M, Tni
pETEC666    StbAB, PsiAB, Mok/Hok              IS1, IS21, IS66, IS600, IS1294, ISfcrK
pETEC58
pETEC52
pETEC1018   StbAB, PsiAB, CcdAB                IS1, IS2, IS3, IS21, IS30, IS66, IS91, IS100, IS629, IS630, IS639, IS911, IS1414, ISShdy1
pETEC746    StbAB, NikAB                       IS2, IS100, IS186, IS1328
pETEC557    SopAB, PsiAB                       IS1, IS30, IS66, IS100, IS911, ISShdy1
pETEC75                                        IS100
pETEC62                                        ISCR2

(a) ND, not determined. pETEC62 has a gene conserved among many small plasmids that is annotated as a "probable replication initiation protein," but no
experimental evidence exists for this function.

FIG. 1. Circular representation of the E. coli H10407 chromosome. From the outside in, the outer circle 1 marks the positions of regions of
difference (mentioned in the text), including prophage (light pink), as well as regions differentially present in other E. coli strains (blue) (see Table
S1 in the supplemental material). Circle 2 shows the sizes in bp. Circles 3 and 4 show the positions of CDSs transcribed in clockwise and
counterclockwise directions, respectively. Genes in circles 3 and 4 are color coded according to the functions of their gene products: dark green,
membrane or surface structures; yellow, central or intermediary metabolism; cyan, degradation of macromolecules; red, information transfer/cell
division; cerise, degradation of small molecules; pale blue, regulators; salmon pink, pathogenicity or adaptation; black, energy metabolism; orange,
conserved hypothetical; pale green, unknown; brown, pseudogenes. Circles 5 and 6 and circles 9 and 10 show the positions of E. coli H10407 genes
that have orthologues (by reciprocal FASTA analysis) in E. coli K-12 MG1655 (blue) or E. coli 042 (green), respectively. Circles 7 and 8 and circles
11 and 12 show the positions of genes unique to E. coli H10407 compared to E. coli K-12 MG1655 (red) or E. coli 042 (gray), respectively. Circle
13 shows a plot of G+C contents (in a 10-kb window). Circle 14 shows a plot of GC skew ([G - C]/[G + C], in a 10-kb window).
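
The GC skew plotted in circle 14 of Fig. 1, (G - C)/(G + C) in a 10-kb window, can be computed as sketched below. This is an illustrative implementation only: the caption does not state whether the windows overlap, so consecutive non-overlapping windows are assumed here, and the function name is hypothetical.

```python
from typing import List


def gc_skew_windows(sequence: str, window: int = 10_000) -> List[float]:
    """Return (G - C)/(G + C) for consecutive non-overlapping windows.

    Windows without any G or C are reported as 0.0 to avoid division by zero.
    """
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews
```

With a toy sequence and a 4-bp window, `gc_skew_windows("GGGCCCCG", window=4)` gives a positive value for the G-rich first window and a negative value for the C-rich second window.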

Comparison of E. coli H10407 with the closely related nonpathogenic
E. coli K-12, C, and HS strains revealed that these
chromosomes are largely colinear (see Fig. S1 in the supplemental
material) and that the E. coli H10407 chromosome
contains 599 CDSs not present in the nonpathogenic strains
(Fig. 2; see Table S2 in the supplemental material). The majority
(528) of these are clustered in the 25 ROD and are
predicted to represent prophage genes and other mobility factors.
Several genes comprise previously described loci specifically
associated with ETEC virulence, viz., leoA (ROD 20), tia
(ROD 20), and tib (ROD 13) (13, 14, 22). Other genes comprise
loci previously noted in ETEC H10407, including the
degenerate ETT2 locus (ROD 18) (45), antigen 43 (ROD 23)

FIG. 2. Comparison of the genetic contents of the E. coli H10407
chromosome with those of the chromosomes of other sequenced
strains of E. coli. (A) Comparison of E. coli H10407 with the three
nonpathogenic E. coli strains HS, C, and K-12 revealed that the four
strains share a large proportion of common genes. Only 599 E. coli
H10407-specific genes were identified. The E. coli H10407-specific
CDSs are not thought to be associated with virulence (see the text for
details). (B) Comparison of E. coli H10407 with the genome-sequenced
ETEC strains E24377A and B7A. The four strains possess
3,553 genes in common; however, the ETEC strains share only 188
genes not present in the commensal strain E. coli HS. The latter genes
are not unique to ETEC; they are widely distributed among E. coli
strains and are largely present among nonpathogenic strains of E. coli,
such as E. coli K-12.


(63), a type 2 protein secretion locus found in many strains of
E. coli (ROD 19) (4), and the ecpP fimbrial gene cluster also
found in many E. coli strains (ROD 1) (4). Other ROD encode
the Sil/Pco efflux system that confers silver/copper resistance
(ROD 2) and yersiniabactin (ROD 11) and comprise the O78
serotype O antigen biosynthetic locus (ROD 14). The sil
operon is closely related to sil from the IncH2 plasmid
pMG101 (30, 38, 53) and is adjacent to a partially interrupted
copper resistance operon similar to pco from plasmid pRJ1004
(2). The sil-pco locus is flanked by insertion sequence (IS)
elements and phage-related sequences, suggesting horizontal
transfer of these genes. The yersiniabactin iron acquisition
locus is widely distributed in E. coli and other members of the
Enterobacteriaceae (49). The remaining E. coli H10407-specific
CDSs, which are not present on a ROD and do not encode
prophage or mobility factors, encode the H11 flagellin subunits
(CDS 2029 to 2033) and an additional copy of antigen 43 (CDS
2119) and comprise several pseudogenes (CDS 427, 1476, and
1573). These data largely agree with previously published
subtractive-hybridization studies (5).

If a particular protein plays an important role in ETEC-mediated
disease, then one would expect the gene encoding it
to have a wide distribution among ETEC strains. To determine
if there were any chromosomal genes specific to ETEC strains,
comparisons were made with E. coli strains E24377A and B7A,
the only other ETEC strains for which genome sequence data
are available (44). Unlike E. coli H10407, both E. coli strains
E24377A and B7A belong to the B1 subgroup of the E. coli
phylogeny, a subgroup from which many commensals, but also
a number of pathogens, are derived (4, 59). Comparison of E.
coli H10407 with the sequenced ETEC strain E24377A revealed
that the chromosomes are largely colinear (see Fig. S2
in the supplemental material). The genome of ETEC B7A is
not finished, but experience with other E. coli genomes and
comparison of the 198 finished ETEC B7A contigs suggest that
the chromosome is also largely colinear with the other sequenced
ETEC genomes (see Fig. S2 in the supplemental
material). Analyses of the gene contents of all three strains
revealed 3,741 genes conserved in all the strains, only 188 of
which are not present in the commensal E. coli HS (Fig. 2B;
see Table S3 in the supplemental material). The 188 genes
identified through this comparison included loci encoding xanthine
dehydrogenase (CDSs 0339 to 0343), the Mat fimbriae
(CDSs 0348 to 0352), conserved proteins with unknown functions
(CDSs 0673 to 0678), a flavoprotein electron transfer
system (CDSs 1730 to 1734), the colanic exopolysaccharide
biosynthetic machinery (CDSs 2171 to 2202), the Fec iron
citrate uptake system (CDSs 3161 to 3166), a cellulose synthase
system (CDSs 3776 to 3779), and a putative sugar utilization
system (CDSs 4145 to 4154), all of which are present in the
nonpathogen E. coli K-12 and are widely distributed among
other E. coli strains (data not shown). The remainder of the
188 genes encode prophage or other mobility factors that are
predicted to have no role in virulence. Of the 599 E. coli
H10407-restricted genes identified through comparisons with
the nonpathogenic E. coli strains mentioned above (Fig. 2A),
47 were conserved among the three pathogenic ETEC isolates.
However, these genes were all related to mobile elements, and
no putative virulence factors were identified. Notably, no significant
homologues of leoA, tibC, tibA, or tia were detected in
either E. coli E24377A or B7A, strongly suggesting these genes
are not essential for ETEC-mediated disease. In conclusion,
these data agree with previous observations that the chromosome
of E. coli H10407 is most closely related to those of
nonpathogenic E. coli strains and that the factors mediating
diarrhea are not chromosomally encoded, indicating that the
essential virulence factors are encoded on the plasmids (61).
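
The gene content comparison described above (genes conserved in all ETEC strains, then those absent from the commensal strain) amounts to set intersection followed by set difference. The sketch below uses invented gene identifiers purely for illustration; it does not reproduce the 3,741 and 188 gene counts, which depend on the actual orthologue assignments.

```python
from typing import Iterable, Set, Tuple


def conserved_and_reference_absent(
    strain_gene_sets: Iterable[Set[str]], reference_genes: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (genes shared by all strains, shared genes missing from the reference)."""
    sets = list(strain_gene_sets)
    core = set.intersection(*sets) if sets else set()
    return core, core - reference_genes


# Toy orthologue-group identifiers (hypothetical, not real CDS names).
h10407 = {"g1", "g2", "g3", "g4"}
e24377a = {"g1", "g2", "g3", "g5"}
b7a = {"g1", "g2", "g3", "g6"}
hs = {"g1", "g2", "g7"}

core, not_in_hs = conserved_and_reference_absent([h10407, e24377a, b7a], hs)
```

In this toy example three groups are conserved in all ETEC strains and one of them is absent from the commensal reference, mirroring the logic of the comparison shown in Fig. 2B.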

Potential  virulence  genes  carried  on  the  ETEC  plasmids. 
Since chromosomal comparisons revealed that no chromosomal
CDS was unique to all three ETEC strains, we next
examined  the  CDSs  present  on  the  four  plasmids  of  ETEC 
H10407.  The  general  characteristics  of  the  plasmids  are 
shown  in  Table  2.  The  two  larger  plasmids  (pETEC948  and 


5826 


CROSSMAN  ET  AL. 


J.  Bacteriol. 




FIG. 3. Nucleotide sequence comparison of large conjugative-like plasmids from ETEC strains. Plasmid sequences from each strain were
concatenated and compared using BLASTn. BLAST matches longer than 250 bp are shown as gray blocks in a comparison between plasmids from
E24377A (pETEC_80, pETEC_74, pETEC_73, and pETEC_35), H10407 (pETEC948 and pETEC666), E1392/75 (pETEC1018, pETEC746, and
pETEC557), and C921b-1 (pCoo). The shading of the gray blocks is proportional to the BLAST match (minimum, 80% nucleotide identity;
maximum, 100% nucleotide identity). Each plasmid is denoted as a black line; the identity of each plasmid is noted above the line, and the source
ETEC strain from which the plasmids are derived is given on the left side of the diagram. Coding sequences are depicted by arrows and are colored
according to known or predicted functions: blue, virulence related; red, plasmid-related protein; green, outer membrane related (includes conjugal
transfer loci); pink, transposase/insertion element related; light blue, regulatory protein; orange, conserved hypothetical protein; uncolored,
hypothetical protein. The positions of genes encoding known or predicted virulence-related proteins are denoted by white boxes containing the
gene names. In addition, the locus encoding the R64 conjugative pilus and the variant PilV tips is also depicted. The putative origin of replication
associated with each of the plasmids is highlighted within yellow-shaded boxes. The chimeric nature of the plasmids is clearly visible, with
recombination between plasmids a frequent occurrence. The unlabeled figure was prepared using a custom script (M. J. Sullivan and S. A. Beatson,
unpublished data).
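
The display rule described in the Fig. 3 legend (matches longer than 250 bp, shaded between 80% and 100% identity) could be sketched as follows. This is not the custom script used to prepare the figure; the linear scaling and the function names are assumptions made for illustration.

```python
def keep_match(length_bp: int, percent_identity: float,
               min_length: int = 250, min_identity: float = 80.0) -> bool:
    """Retain BLAST matches longer than 250 bp with at least 80% identity."""
    return length_bp > min_length and percent_identity >= min_identity


def gray_level(percent_identity: float,
               min_identity: float = 80.0, max_identity: float = 100.0) -> float:
    """Map identity linearly to a shade from 0.0 (minimum) to 1.0 (maximum).

    Values outside the range are clamped; the linear scale is an assumption.
    """
    clamped = max(min_identity, min(max_identity, percent_identity))
    return (clamped - min_identity) / (max_identity - min_identity)
```

Under these assumptions, a 1,200-bp match at 90% identity would be drawn at the midpoint of the gray scale, and a 250-bp match would be omitted regardless of identity.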


pETEC666) are reminiscent of conjugative plasmids that are
often associated with the carriage of virulence factors, whereas
the two smaller plasmids (pETEC58 and pETEC52) are homologous
to mobilizable plasmids frequently encountered in a
variety of bacterial species (24, 34). The latter plasmids have
been shown to be mobilizable in the presence of IncF and
other plasmid transfer systems (51). The majority of the CDSs
on all four plasmids encode plasmid maintenance and transfer
functions or are pseudogenes, genes with unknown functions
not predicted to be involved in virulence, or transmissible
elements (Table 2). An exhaustive list of the genetic content is
unwarranted here, as a complete annotation of the plasmids is
provided in the EMBL databases. Nevertheless, there are several
noteworthy CDSs, described below, that can be termed
“cargo” genes that have a known or putative role in pathogenesis.
Thus, analyses revealed that E. coli H10407 pETEC948
possesses cargo genes encoding the previously described EatA
SPATE (eatA), heat-stable enterotoxin STa2 (sta2), CFA/I
fimbriae and associated regulator (cfaABCD), and the Etp
two-partner secretion system and associated glycosyltransferase
(etpABC) (Fig. 3) (18, 23, 42, 66). Analyses of the E. coli
H10407 pETEC666 plasmid revealed that it contains the cargo
genes encoding the previously described heat-stable enterotoxin
STa1 (sta1) and the two subunits of LT enterotoxin (eltA
and eltB) (Fig. 3) (8, 65). In addition, the plasmids contain
several loci not previously associated with ETEC strains. ETEC
H10407 pETEC948 possesses genes comprising a type I secretion
locus similar to the dispersin secretion locus (aatABCDP)
described for E. coli 042 (Fig. 4) (52). Associated with this
locus is a gene encoding CexE, a previously described secreted
protein of ETEC (43), which bears homology to the E. coli 042
dispersin protein (Fig. 4). Furthermore, pETEC666 carries
genes encoding a two-component sensor kinase, here designated
etcA and etcB (E. coli two-component), and a three-gene
locus (here designated eor for E. coli oxidoreductase) encoding
a protein with homology to cytochrome b-type subunit oxidoreductase
protein (eorA), a protein with homology to an
oxidoreductase molybdopterin binding domain protein (eorB),
and a periplasmic protein with unknown function (eorC). In
addition, ETEC H10407 pETEC58 encodes a putative deoxycytidylate
deaminase (pETEC58_0005).

As mentioned above, if a particular protein plays an important
role in ETEC-mediated disease, then one would expect it
to have a wide distribution among ETEC strains. To determine
whether the genes encoding the putative and known virulence
factors of the ETEC H10407 plasmids, which we identified
above, were conserved among ETEC strains, we next examined
their prevalence among the available sequenced strains. To aid
in this process, we determined the sequences of the plasmids
from ETEC strain E1392/75. E. coli E1392/75 possesses five
plasmids: three large conjugative plasmids designated
pETEC1018, pETEC746, and pETEC557 and two mobilizable
plasmids termed pETEC75 and pETEC62 (Table 2 lists their
general characteristics). Included in the prevalence investigations
were the ETEC strains E24377A and B7A and the plasmid
pCoo from ETEC strain C921b-1, all of which were sequenced
in other projects (28, 44). As the ETEC B7A genome
is incomplete and no plasmids were resolved and pCoo is the
only plasmid sequenced from ETEC C921b-1, we can confirm
only the presence of genes among the available DNA sequences
and not the absence of particular genes from these
strains. The distributions and locations of the cargo genes
encoding known or putative virulence factors among the sequenced
ETEC plasmids are depicted in Fig. 3 and are also shown
in Table S4 in the supplemental material. Comparative analyses
revealed that, like ETEC H10407, the ETEC strains E1392/75,
B7A, and E24377A possess the ST and LT enterotoxins
(none were identified for E. coli C921b-1, but previous analyses
showed that the strain harbors LT and ST) (54). The EtpABC
two-partner secretion system was identified in ETEC E1392/75
and E24377A. Homologues may exist in ETEC strains B7A
and C921b-1, but their existence or nonexistence in these
strains could not be resolved due to the lack of complete
sequence data; however, other studies have not demonstrated
a universal association of the etpABC locus with ETEC strains
(23). In contrast to ETEC strains H10407, E24377A, and C921b-1,
the ETEC E1392/75 plasmids do not carry the
autotransporter-encoding eatA gene. A homologue annotated as EatA is
found in E. coli B7A; however, further analyses of this protein
revealed that it is more closely related to SepA, a homologous
SPATE protein from Shigella flexneri (1). No equivalents of
ETEC H10407 etcAB or eorABC or of the gene encoding the
putative deoxycytidylate deaminase were detected in any of the
other ETEC strains.

FIG. 4. Comparison of the EAEC aat-aap locus with the aat-cexE loci of ETEC strains. (A) The genetic organizations of the aat and cexE loci
are depicted. The level of amino acid identity for each component of the aat-cexE system is shown; the percentages represent comparison with the
E. coli H10407 orthologues. Orthologues are color coded for ease of identification. Genes that are not juxtaposed are depicted with a blue line
separating them. (B) Amino acid sequence alignment of ETEC CexE proteins with the EAEC 042 dispersin. All three proteins possess a signal
sequence that is cleaved after the amino acid at position 21 in the alignment. There is limited conservation in the sequences; however, two cysteine
residues that are disulfide bonded in dispersin are conserved. Based on the structure of dispersin, the remainder of the conserved residues appear
to represent hydrophobic core residues required for structural integrity of the molecule. Asterisks indicate positions of amino acid identity; periods
and colons show positions of low and high amino acid similarity.

Like E. coli H10407, the ETEC strains E24377A, E1392/75,
and C921b-1 encode dispersin-like proteins previously designated
CexE (43). Further analyses revealed that CexE is
present in ETEC strains 27D and G427 (two CFA/I+ strains)
(43) and ETEC O167:H5, a CS6- and CS5-encoding strain (9).
For EAEC, dispersin is secreted via the Aat type I secretion
system; associates noncovalently with the extracellular face of
the outer membrane, preventing collapse of the AAF/II fimbriae
onto the bacterial cell surface by alteration of the surface
charge; and is required for colonization (31, 40, 52). Analyses
of the nucleotide sequences from ETEC strains B7A,
E24377A, and E1392/75 revealed the presence of loci encoding
type I secretion systems bearing striking homology to the Aat
dispersin secretion system (Fig. 4). The cooccurrence of cexE
genes with aat loci suggests that the CexE proteins are substrates
for the Aat-like secretion systems of ETEC. Since
plasmid-borne fimbrial loci are inextricably linked to ETEC-mediated
disease (18), CexE may play a role similar to that of
dispersin by maintaining the CFs in such a manner that they
can interact with epithelial receptors. However, further studies
are required to investigate the function and distribution of
CexE and to identify other relatives of this protein not yet
recognized.

As  mentioned  above,  adherence  via  plasmid-encoded  fim- 
brial  systems  is  a  crucial  step  in  ETEC  pathogenesis  (62).  E. 
coli  H10407  pETEC948  possesses  the  CFA/I  chaperone-usher 
system  (Fig.  3).  ETEC  E24377A  possesses  two  chaperone- 
usher  fimbrial  systems  located  on  pETEC_S0  and  pETEC_73. 
encoding  the  CS3  and  CS1  fimbriae,  respectively  (44).  Simi¬ 
larly,  E.  coli  E1392/75  possesses  CS3-  and  CSl-encoding  loci 
on  plasmids  pETEC1018  and  pETEC746,  respectively, 
whereas  pCoo  possesses  the  CS1  cluster,  all  of  which  have  been 
described  previously  (28,  57,  58).  In  addition,  E.  coli  E1392/75 
pETEC557  also  encodes  the  CFA/III-type  IV  fimbria  (29).  To 
determine whether fimbrial systems other than those mentioned
above might play a crucial role in ETEC pathogenesis,
we  investigated  conservation  of  putative  fimbrial  loci  among 
the  available  E.  coli  sequences.  ETEC  H10407  contains  12 
additional  loci  predicted  to  encode  fimbriae,  all  of  which  are 
chromosomally located (see Table S5 in the supplemental material).
Four of these loci (mat, sfm, ycb, and yde) contain
pseudogenes and were considered nonfunctional. We sought
to establish if E. coli H10407 harbored ETEC-specific fimbrial
loci that might not be expressed by commensal E. coli, E. coli
K-12,  or  enteroaggregative  E.  coli.  The  vast  majority  of  fimbrial 
operons  identified  are  also  located  in  commensal  and  labora¬ 
tory  strains,  with  notable  exceptions.  The  yqi  and  stf-mrf  fim¬ 
brial  loci  are  present  in  E.  coli  H10407  but  contain  pseudo¬ 
genes  in  commensal  or  laboratory  E.  coli  strains.  However,  an 
apparently functional yqi operon is also present in enteroaggregative
E. coli strain 042, and thus, a functional yqi locus does
not  appear  to  be  ETEC  specific.  Indeed,  the  yqi  operon  does 
not  appear  to  be  present  in  ETEC  B7A  (4).  With  regard  to  the 
stf-mrf  operon,  the  mrfC  gene  is  a  pseudogene  in  E.  coli  K-12 
but  not  in  ETEC  H10407.  This  six-gene  cluster  (smfA-mrfCD- 
stfEFG)  is  present  in  ETEC  E24377A  and  EAEC  042,  though 
with  some  divergence  in  the  stf  genes. 

Finally,  the  ETEC  E1392/75  pETEC62  plasmid  possesses 
CDSs  encoding  a  type  II  dihydropteroate  synthase  gene  con¬ 
ferring  sulfonamide  resistance  and  CDSs  encoding  streptomy¬ 
cin  phosphotransferase  genes  conferring  streptomycin  resis¬ 
tance.  The  plasmid  possesses  99%  nucleotide  identity  with  the 
ETEC  E24377A  pETEC_6  plasmid  and  shares  high  levels  of 
identity  with  plasmids  from  a  variety  of  E.  coli  strains,  including 
the Shigella sonnei pKKTET7 and the EPEC pE2348-2 plasmids.
However, this plasmid has no homologue in ETEC
H10407  and  no  detectable  homology  among  the  ETEC  B7A 
sequences,  suggesting  it  may  not  be  widespread  among  ETEC 
strains  and  thus  is  not  essential  for  ETEC-mediated  diarrhea. 

In  conclusion,  the  putative  and  known  virulence  genes  iden¬ 
tified  on  the  plasmids  of  E.  coli  H10407  have  differential  dis¬ 
tributions  among  the  sequenced  ETEC  strains.  In  all  cases,  the 
ETEC  strains  possess  genes  encoding  the  ST  and/or  LT  toxins 
(sta  and/or  eltAB,  respectively),  a  chaperone-usher  fimbrial 
biogenesis  locus  (e.g.,  the  cfa  locus),  and  components  of  an 
aat-cexE  dispersin-like  type  I  secretion  system.  Thus,  despite 
the  variation  in  individual  plasmid  gene  contents,  comparison 
of  the  entire  plasmid  complement  of  the  sequenced  ETEC 
strains  suggests  that  there  is  a  conserved  core  of  genes  con¬ 
tained  on  the  plasmids  that  are  predicted  to  be  involved  in 
virulence  and  may  be  essential  for  the  establishment  of  ETEC- 
mediated  disease. 

ETEC  plasmids  demonstrate  a  mosaic  structure.  To  deter¬ 
mine  whether  the  virulence  factors  identified  above  were  en¬ 
coded  on  a  specific  plasmid,  or  repertoire  of  plasmids,  we 
examined  the  nucleotide  sequence  identity  shared  by  the 
ETEC plasmids. The nucleotide sequences of the conjugative
plasmids  from  each  of  the  ETEC  strains  H10407,  E1392/75, 
and  E24377A  were  concatenated  and  compared  using 
BLASTn.  The  levels  of  nucleotide  sequence  identity  between 
pCoo  and  the  other  ETEC  plasmids  were  determined  in  a 
similar  manner.  These  comparisons  revealed  that  while  the 
plasmids  all  belong  to  a  narrow  subset  of  incompatibility 
groups (see below), extensive rearrangements and recombination
events have occurred, resulting in individual plasmids that
vary  in  their  repertoires  of  virulence  genes  (Fig.  3;  see  Table  S4 
in  the  supplemental  material).  Such  recombination  can  be  seen 
by  examining  the  distribution  of  the  eatA  gene.  Thus,  the  eatA 
gene is not present in ETEC strain E1392/75, and in ETEC
strain  E24377A,  the  eatA  gene  is  located  on  pETEC_74  and 
the  eltAB,  aatPABC,  and  etpABC  loci  are  located  on 
pETEC_80.  In  contrast,  in  ETEC  strain  H10407,  the  eatA  gene 
is  collocated  with  etpABC  and  aatPABC  on  pETEC948, 
whereas  the  eltAB  locus  is  located  on  pETEC666.  The  eatA 
gene  is  present  on  ETEC  C921b-1  pCoo,  along  with  cooABCD ; 
however,  in  ETEC  strain  E24377A,  cooABCD  is  located  on  a 
separate  plasmid  (pETEC_73)  (Fig.  3;  see  Table  S4  in  the 
supplemental  material).  Other  virulence-associated  genes  also 
display  such  differential  distributions  (see  Table  S4  in  the  sup¬ 
plemental  material),  suggesting  that  the  extrachromosomal 
components  of  the  ETEC  genome  are  in  a  state  of  flux  (34, 44). 
Notably,  the  plasmids  contain  an  extensive  repertoire  of  IS 
elements  and  transposons  (Table  2)  (34);  it  is  likely  that  the 
mobility  of  these  genetic  elements,  or  recombination  between 
the  elements,  gives  rise  to  the  observed  mosaic  structure  of  the 
ETEC  plasmids. 
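
The differential distribution described above can be tabulated in a short sketch. The Python below is illustrative only: the dictionary records just the locus placements stated in this section (not the full contents of Table S4), and the names LOCI_BY_PLASMID, plasmids_carrying, and colocated are hypothetical helpers rather than part of the analysis pipeline used in this study.

```python
# Locus placements copied from the text of this section only;
# Table S4 in the supplemental material holds the complete data.
LOCI_BY_PLASMID = {
    "H10407": {
        "pETEC948": {"eatA", "etpABC", "aatPABC"},
        "pETEC666": {"eltAB"},
    },
    "E24377A": {
        "pETEC_74": {"eatA"},
        "pETEC_80": {"eltAB", "aatPABC", "etpABC"},
        "pETEC_73": {"cooABCD"},
    },
    "C921b-1": {
        "pCoo": {"eatA", "cooABCD"},
    },
}


def plasmids_carrying(locus):
    """Map each strain to the sorted plasmids on which `locus` is recorded."""
    result = {}
    for strain, plasmids in LOCI_BY_PLASMID.items():
        hits = sorted(name for name, loci in plasmids.items() if locus in loci)
        if hits:
            result[strain] = hits
    return result


def colocated(strain, locus_a, locus_b):
    """Return True if both loci are recorded on one plasmid in `strain`."""
    return any(
        {locus_a, locus_b} <= loci
        for loci in LOCI_BY_PLASMID[strain].values()
    )
```

Under these assumptions, colocated("H10407", "eatA", "etpABC") is true whereas colocated("E24377A", "eatA", "etpABC") is false, which is the kind of strain-to-strain rearrangement that the mosaic model predicts.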

Similar  comparisons  of  the  small  mobilizable  plasmids  of  the 
ETEC  strains  did  not  demonstrate  recombination  between  the 
mobilizable  plasmids.  Furthermore,  there  did  not  appear  to  be 
any  significant  exchange  of  genetic  material  between  the  con¬ 
jugative  plasmids  and  the  small  mobilizable  plasmids  (data  not 
shown). 
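
Comparing concatenated plasmid sequences, as was done for the comparisons in this section, requires some bookkeeping to relate a position in the concatenation back to the plasmid it came from. The sketch below shows one way this could be done; it is an assumption on our part, since the coordinate handling is not described here, and it does not run BLASTn itself. The function names and the toy records in the usage note are hypothetical.

```python
from bisect import bisect_right


def concatenate(records):
    """Join (name, sequence) pairs.

    Returns the joined sequence, the plasmid names in order, and the
    0-based start offset of each plasmid within the joined sequence.
    """
    names, starts, parts = [], [], []
    position = 0
    for name, seq in records:
        names.append(name)
        starts.append(position)
        parts.append(seq)
        position += len(seq)
    return "".join(parts), names, starts


def locate(position, names, starts):
    """Convert a 0-based position in the concatenation to (plasmid, local position)."""
    index = bisect_right(starts, position) - 1
    return names[index], position - starts[index]
```

For example, with the invented records [("pA", "ACGT"), ("pB", "GG"), ("pC", "TTT")], position 4 of the concatenation maps to ("pB", 0). Hit coordinates reported against a concatenated subject could be translated in the same way before plotting.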

Plasmid  stability  and  maintenance  functions  of  the  ETEC 
plasmids.  To  determine  whether  the  virulence  factors  de¬ 
scribed  above  were  encoded  on  self-transmissible  plasmids,  we 
examined  the  CDSs  encoding  the  plasmid  maintenance  and 
transfer functions of each ETEC plasmid. A complete description
of E. coli H10407 pETEC666 has been published (41), and
the complete repertoire of genes for each ETEC plasmid is
given  in  the  EMBL  databases  (see  Table  2  for  accession  num¬ 
bers);  thus,  only  the  most  salient  features  are  described  here. 
Plasmid  nomenclature  utilizes  a  system  based  on  incompatibil¬ 
ity  groupings;  plasmids  of  the  same  incompatibility  group 
should  not  coexist  within  the  same  bacterial  cell  because  of  the 
similarity  of  their  replication  systems  (34).  However,  sequence 
analyses  of  the  CDSs  encoding  the  plasmid  replication  func¬ 
tions  of  the  repertoire  of  ETEC  plasmids  revealed  that  the 
large  conjugative-like  plasmids  of  E.  coli  strains  H10407, 
E1392/75, and E24377A belong to a narrow subset of incompatibility
groups and comprise multiple plasmids with the same
replication  mechanism  (Fig.  3  and  Table  2).  Thus,  the  E.  coli 
H10407  plasmids  pETEC948  and  pETEC666  belong  to  the 
RepFIIA  (IncFIIA)  subset  of  incompatibility  groupings  and 
have RepA1 proteins that share 94% identity (95% similarity),
whereas the E. coli E1392/75 plasmids pETEC746 and
pETEC557 harbor RepI1 (IncI1) replication functions (E. coli
E1392/75 pETEC557 is an apparent cointegrate of RepFIB
and RepI1 plasmids; such cointegration has previously been
noted for E. coli C921b-1, where pCoo represents a cointegrate
between a RepFIIA and a RepI1 plasmid [28]), with the corresponding
RepZ proteins sharing 94% identity (95% similarity).
Similarly, the previously described ETEC strain E24377A
(44)  possesses  three  plasmids  with  RepFIIA  functions.  The 
basis for these antidogmatic observations is not understood
and  requires  further  in-depth  investigation. 
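
The observation that plasmids sharing a replicon type coexist in one strain can be expressed as a simple grouping check. The following Python sketch is illustrative only: it encodes just the replicon assignments stated in this section for strains H10407 and E1392/75 (the three RepFIIA plasmids of E24377A are not named here and are therefore omitted), and REPLICONS and shared_replicons are hypothetical names.

```python
from collections import defaultdict

# Replicon assignments as stated in this section. pETEC557 is listed with
# two types because it is described as a RepFIB/RepI1 cointegrate.
REPLICONS = {
    "H10407": {
        "pETEC948": {"RepFIIA"},
        "pETEC666": {"RepFIIA"},
    },
    "E1392/75": {
        "pETEC746": {"RepI1"},
        "pETEC557": {"RepFIB", "RepI1"},
    },
}


def shared_replicons(strain):
    """Return replicon types carried by more than one plasmid in `strain`."""
    carriers = defaultdict(list)
    for plasmid, types in REPLICONS[strain].items():
        for rep in types:
            carriers[rep].append(plasmid)
    return {rep: sorted(names) for rep, names in carriers.items() if len(names) > 1}
```

Any nonempty result flags a strain whose plasmid complement runs counter to the expectation that plasmids of one incompatibility group should not coexist.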

Analyses  of  the  nucleotide  sequences  of  the  repertoire  of 
large  conjugative-like  plasmids  revealed  that  they  possess  a 
number  of  plasmid  stability  systems,  including  postsegregation 
killing  systems  and  active-partitioning  systems.  The  distribu¬ 
tion  of  these  systems  among  the  plasmids  sequenced  in  this 
study  is  given  in  Table  2.  These  stability  systems  have  been 
described  previously  (25,  56). 

Previous  studies  have  noted  that  the  large  plasmids  encoding 
the  toxins  of  ETEC  are  in  some  cases  self-transmissible  and  in 
other  cases  not  transmissible  (27).  To  investigate  whether  the 
plasmids  sequenced  in  this  study  possessed  transmissibility 
functions,  we  examined  the  transfer  regions  of  the  conjugative- 
like  plasmids.  As  noted  previously,  E.  coli  H10407  pETEC666 
has a transfer region that is interrupted by several IS cES elements,
severely diminishing the ability of the system to function
efficiently (41). In contrast, E. coli H10407 pETEC948
possesses  only  remnants  of  the  conjugation  apparatus  and  is 
presumably  not  self-transmissible.  In  addition,  the  E.  coli 
E1392/75 pETEC1018 plasmid also contains an incomplete
conjugation apparatus, which is presumed to be ineffective at
promoting conjugation; however, E. coli E1392/75 pETEC746
possesses an intact conjugation system that is 100% identical to
the  region  encoding  the  functional  R64-like  conjugative  pilus 
of  pCoo  of  E.  coli  C921b-1,  and  thus,  it  is  presumed  to  be 
functional.  E.  coli  E1392/75  pETEC557  lacks  CDSs  encoding 
the  R64  conjugative  pilus  and  possesses  remnants  of  an  F-like 
conjugation  system. 

ETEC  strains  H10407,  E1392/75,  and  E24377A  all  contain 
similar small mobilizable plasmids (pETEC52, pETEC75, and
pETEC_5, respectively) with mob and rep regions displaying
100% identity. The E. coli E1392/75 pETEC75 plasmid contains
an IS100 element not present in the other two plasmids.
The  distribution  of  these  plasmid  types  among  the  sequenced 
ETEC  strains  suggests  that  they  might  be  common  compo¬ 
nents  of  ETEC  genomes.  This  plasmid  type  has  been  found  in 
a  number  of  other  E.  coli  strains  and  has  been  shown  to 
increase  the  fitness  of  certain  E.  coli  host  strains  (16). 
Therefore, multiple selective advantages might be conferred
on the ETEC strains possessing these small plasmids. The
rep and mob regions (3,058 bp) of the ETEC H10407 pETEC58
plasmid, which encodes the putative deoxycytidylate
deaminase, demonstrate 81% identity with plasmid pHW66
from Rahnella sp. strain WMR66; the putative deoxycytidylate
deaminase is lacking in pHW66. In contrast to the other
ETEC  plasmids,  there  are  no  plasmids  homologous  to 
ETEC  H10407  pETEC58  among  the  other  genome-se¬ 
quenced  ETEC  isolates. 

The E. coli E1392/75 pETEC746 plasmid contains a pilin
shufflon. As mentioned above, ETEC E1392/75 pETEC746
contains regions homologous to the Salmonella enterica serovar
Typhimurium RepI1 plasmid R64 that are also present in
E.  coli  C921b-1  pCoo  and  that  have  been  shown  to  be  func¬ 
tional  in  that  system  (28).  As  sequencing  of  the  ETEC  genome 
was  being  completed,  dideoxy  sequencing  of  the  region  from 
bp  56253  to  59961  of  pETEC746  from  E.  coli  E1392/75  iden¬ 
tified  a  nucleotide  region  undergoing  dynamic  alteration.  The 
region  of  DNA  consisted  of  a  shufflon  similar  to  that  of  R64 
(36). PilV is a component of a conjugative pilus that expresses
different tips involved with attachment to cells. The tips are
regulated via a DNA shufflon mechanism involving recombination
at particular repeating sites. Recombination is mediated
by the rci recombinase linked to this region. Alternative tip
adhesins are involved in attachment to different strains and
species and have been elucidated experimentally in S. Typhimurium
(36). Evidence that the shufflon is functional in the E.
coli E1392/75 plasmid pETEC746 is provided in the sequences
from a small-insert library. Within the sequences are examples
of pilV with alternative C-terminal tips, implying that the plasmids
sequenced represented a population in genetic flux.
There is direct evidence for sequences of pilV with tips VI, VS,
and V4 (Fig. 5). There are also regions of DNA sequence
equivalent to tips shuC1, shuC', and shuC2 from S. Typhimurium.
However, these were present only in a small subpopulation
of pETEC746 plasmids and have been omitted from the
complete finished sequence.

FIG. 5. Arrangement of the pilV shufflon region of E. coli E1392/75
pETEC746. Annotation of the pilV region is shown using the Artemis
sequence viewer (1). Sequence blocks encoding C-terminal fragments
of PilV are found in both orientations between pilV and the rci
recombinase gene. Identical 13-bp repeats (GTGCCAATCCGGT) are
shown as miscellaneous features and mark the predicted sites of
recombination between the C-terminal fragments and the pilV gene.
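
Because the shufflon segments occur in both orientations, the 13-bp repeat reported in the Fig. 5 legend would be expected on either strand of the region. The Python sketch below shows how exact copies of that repeat could be located on both strands of a sequence. It is a minimal illustration, not the annotation procedure used for pETEC746; the function names are hypothetical, and only exact matches are reported.

```python
REPEAT = "GTGCCAATCCGGT"  # 13-bp repeat given in the Fig. 5 legend

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(sequence, repeat=REPEAT):
    """Return sorted (0-based start, strand) pairs for exact copies of `repeat`.

    A "-" strand hit means the reverse complement of `repeat` is present
    on the given sequence at that start position.
    """
    sequence = sequence.upper()
    hits = []
    for strand, motif in (("+", repeat), ("-", reverse_complement(repeat))):
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = sequence.find(motif, start + 1)
    return sorted(hits)
```

Applied to an invented test sequence such as "AAA" + REPEAT + "CC" + reverse_complement(REPEAT) + "T", the function reports one hit on each strand. Running it on the pETEC746 region from bp 56253 to 59961 would require the deposited sequence (see Table 2 for accession numbers).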

Conclusions.  This  study  provides  a  genomic  context  for  the 
vast  amount  of  experimental  and  epidemiological  data  pub¬ 
lished  thus  far  and  provides  a  template  for  future  diagnostic 
and  intervention  strategies.  Evidence  presented  here  suggests 
that the prototypical ETEC isolate E. coli H10407 was a commensal
isolate that acquired a number of plasmids containing
a limited repertoire of virulence genes and thereby gained the
ability to cause disease. Furthermore, comparisons of the genetic
content of E. coli H10407 with those of other ETEC
strains have revealed only a limited number of conserved genes,
suggesting that to become pathogenic, E. coli need only acquire
(i) toxins (ST, LT, or both) to elicit net secretion from
enterocytes; (ii) a fimbrial system that mediates attachment to
the intestinal epithelium, e.g., CFA/I; and (iii) a novel type I
secretion  system,  the  substrate  of  which  (CexE)  maintains  the 
fimbriae  in  the  correct  physical  organization.  These  data  sug¬ 
gest  that  ETEC  vaccine  strategies  should  focus  on  these  plas¬ 
mid-encoded  virulence  factors.  However,  given  the  relative 
plasticity  of  the  E.  coli  genome,  molecular  epidemiological 
studies  are  essential  to  determine  whether  these  factors  are 
widely  distributed  among  ETEC  strains  from  geographically 
diverse  locations. 